UCT: Derived memh #10332

iyastreb · 2024-11-26T10:31:14Z

What?

This PR is one part of the new memory invalidation design based on the concept of memory windows.
Here we introduce the changes on the UCT side.
On the UCT side, a memory window is a shallow copy of an existing UCT memory handle that can be used to access the same memory region. When created, the memory window inherits the access flags and state of the original memh and takes ownership of the original memory handle's indirect keys. The lifetime of the memory window is bound to the original memh: the original memh cannot be destroyed until all of its memory windows are destroyed.

Design doc

Why?

The overall idea is to employ a lightweight invalidation: when a failure happens, we invalidate only the exposed indirect rkeys instead of deregistering the entire memory region, which is an expensive operation. Lightweight invalidation enables the invalidation workflow for the RNDV pipeline protocol.

How?

Introduced a new option for uct_md_mem_reg_params_t: UCT_MD_MEM_WINDOW. With this option, the existing mem_reg call creates a memory window based on an existing UCT memh.
When a memory window is requested, create a shallow copy of the base UCT memh and take ownership of its indirect memory keys.
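The lifetime rule from the description (a base memh cannot be destroyed while memory windows derived from it are alive) can be illustrated with a minimal sketch. All names here (`sketch_memh_t`, `sketch_memh_derive`, and so on) are hypothetical and are not the UCT API. The counter on the base handle, the refusal to derive from a derived handle, and the derived handle starting without its own key are assumptions of the sketch, not something the PR states.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified memory handle; not the real UCT layout. */
typedef struct sketch_memh {
    struct sketch_memh *base;          /* NULL for an original handle */
    unsigned            access_flags;  /* inherited by derived handles */
    unsigned            indirect_rkey; /* per-handle key, 0 = not created */
    int                 num_derived;   /* live derived handles (base only) */
} sketch_memh_t;

/* Register an original handle with the given access flags. */
static sketch_memh_t *sketch_memh_reg(unsigned access_flags)
{
    sketch_memh_t *memh = calloc(1, sizeof(*memh));

    if (memh != NULL) {
        memh->access_flags = access_flags;
    }
    return memh;
}

/* Create a derived handle: a shallow copy that points back to its base. */
static sketch_memh_t *sketch_memh_derive(sketch_memh_t *base)
{
    sketch_memh_t *derived;

    /* Assumption of this sketch: only original handles can be a base */
    if ((base == NULL) || (base->base != NULL)) {
        return NULL;
    }

    derived = malloc(sizeof(*derived));
    if (derived == NULL) {
        return NULL;
    }

    memcpy(derived, base, sizeof(*derived)); /* inherit flags and state */
    derived->base          = base;
    derived->indirect_rkey = 0;              /* keys are per handle */
    derived->num_derived   = 0;
    base->num_derived++;
    return derived;
}

/* Returns 0 on success, -1 if a base still has live derived handles. */
static int sketch_memh_dereg(sketch_memh_t *memh)
{
    if (memh->base != NULL) {
        memh->base->num_derived--;
    } else if (memh->num_derived > 0) {
        return -1;
    }

    free(memh);
    return 0;
}
```

In this sketch, destroying a derived handle is the cheap "invalidation" step, while the base registration stays intact until the last derived handle is gone.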

src/uct/api/uct.h

src/uct/api/v2/uct_v2.h

src/uct/ib/mlx5/dv/ib_mlx5dv_md.c

brminich · 2024-11-28T12:48:09Z

src/uct/ib/mlx5/dv/ib_mlx5dv_md.c

+    base->atomic_dvmr   = NULL;
+    base->atomic_rkey   = UCT_IB_INVALID_MKEY;
+    base->indirect_dvmr = NULL;
+    base->indirect_rkey = UCT_IB_INVALID_MKEY;


Why do we need to invalidate the base memh keys? I'd expect this to break (or at least slow down) some operations on the base key, atomics at least.

I think it should not hinder performance, because the base memh rkeys will be recreated on the next mkey_pack, as is done in my test.
But you are right, strictly speaking it's NOT necessary to transfer rkey ownership to the derived memh. It will work even without this transfer. I will remove this ownership transfer logic.

The reason why I added this transfer is the following:
Currently it's UCT layer that decides whether to create rkeys (atomic + indirect), and conditions for them are different:

atomic rkey is created if user requested it (with UCT_MD_MEM_ACCESS_REMOTE_ATOMIC), and hardware supports it. So it might be created no matter if invalidation is needed

indirect rkey is created only when invalidation is needed
Now, from the UCP layer, we assume that we will create a memory window when invalidation is needed according to the EP config. By this time the atomic rkey on the base memh might already be created, and I thought it was a good idea to move it to the memory window to avoid re-creating this key per MW.

I removed ownership transfer for now, let's consider this later
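For readers following this thread, the ownership transfer that was discussed (and then removed from the PR) can be sketched as below. The names are hypothetical, the key allocator is a fake counter, and `SKETCH_INVALID_MKEY` merely stands in for `UCT_IB_INVALID_MKEY`. The sketch only illustrates why resetting the base keys does not break the base handle when keys are created lazily at pack time.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for UCT_IB_INVALID_MKEY; the real value is not assumed here. */
#define SKETCH_INVALID_MKEY UINT32_MAX

/* Hypothetical handle holding one lazily created key. */
typedef struct {
    uint32_t indirect_rkey;
} sketch_keys_t;

/* Fake key allocator: every newly created key gets a fresh number. */
static uint32_t sketch_next_key = 100;

/* Pack a key: create it on first use, otherwise reuse the cached one. */
static uint32_t sketch_mkey_pack(sketch_keys_t *memh)
{
    if (memh->indirect_rkey == SKETCH_INVALID_MKEY) {
        memh->indirect_rkey = sketch_next_key++;
    }
    return memh->indirect_rkey;
}

/* Move the key to the derived handle and reset it on the base. */
static void sketch_transfer_keys(sketch_keys_t *base, sketch_keys_t *derived)
{
    derived->indirect_rkey = base->indirect_rkey;
    base->indirect_rkey    = SKETCH_INVALID_MKEY;
}
```

After a transfer, the next pack on the base creates a different key, so previously packed base rkeys no longer match the base handle. This is the side effect the reviewers objected to.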

brminich · 2024-11-28T12:50:32Z

test/gtest/uct/ib/test_ib_md.cc

+
+    /* Test case 4: base memh can still be used to pack mkeys */
+    std::vector<uint8_t> base_rkey2 = mkey_pack(base, flags);
+    EXPECT_NE(base_rkey1, base_rkey2);


i do not think it is correct. base memh should not be modified once derived memh is created

ok, fixed this

tvegas1 · 2024-12-09T13:20:28Z

test/gtest/uct/ib/test_ib_md.cc

+    uct_mem_h mw1 = reg_mem_window(base);
+
+    /* Test case 2: creating MW from memh after mkey_pack */
+    std::vector<uint8_t> base_rkey1 = mkey_pack(base, flags);


do we need invalidation on first mkey pack of base?

do we need test with packing without invalidation first?

In UCT terms, invalidation would mean that we destroy the derived memh. Currently I don't have tests with invalidation; it's a good point to add them.
Or did I misunderstand your comment?

src/uct/api/v2/uct_v2.h

src/uct/ib/base/ib_md.c

test/gtest/uct/ib/test_ib_md.cc

rakhmets · 2025-01-31T13:53:23Z

src/uct/cuda/cuda_ipc/cuda_ipc_md.c

@@ -372,9 +372,17 @@ static ucs_status_t
 uct_cuda_ipc_mem_reg(uct_md_h md, void *address, size_t length,
                     const uct_md_mem_reg_params_t *params, uct_mem_h *memh_p)
 {
+    uct_mem_h base = (params != NULL) ?


params is not checked to be non-NULL in other places. I think we can remove this check here as well.

Yes, initially I didn't have this check, added it to fix NULL pointer crash in CI

Do you remember the failed test? Looks like an issue in caller function.

I don't remember it, but I just see my commit named "Fix NPE" that I've made to address that failure: 53a4b08

Ok, found and fixed this NULL pointer error in gtest: b193dea

src/uct/cuda/cuda_ipc/cuda_ipc_md.c

yosefe · 2025-02-03T07:59:49Z

src/uct/api/v2/uct_v2.h

+     * A pointer to an existing memory handle.
+     * Used to register a derived memh: a shallow copy of an existing UCT memh
+     * which can be used to access the same memory region. When created, the
+     * derived memh inherits the access flags and the state of the original


can't we modify the access flags when creating a memory window? i would expect yes

I was not aware of this use case.
Do you propose to update this documentation? Or extend derived memh API so that we can create it with alternate access rights?

yosefe · 2025-02-03T08:00:15Z

src/uct/cuda/cuda_ipc/cuda_ipc_md.c

@@ -375,6 +375,10 @@ uct_cuda_ipc_mem_reg(uct_md_h md, void *address, size_t length,
    uct_cuda_ipc_memh_t *memh;
    CUdevice cu_device;

+    UCT_CHECK_PARAM((params == NULL) ||


IMO no need to check (params == NULL)

I got the same comment from Raul.
I added this check intentionally, because some unit test was triggering this call with a NULL param.
Ok, let me remove it, so we can see exactly which test it is.

Ok, found and fixed this NULL pointer error in gtest: b193dea

yosefe · 2025-02-03T08:01:16Z

src/uct/cuda/cuda_ipc/cuda_ipc_md.c

@@ -375,6 +375,10 @@ uct_cuda_ipc_mem_reg(uct_md_h md, void *address, size_t length,
    uct_cuda_ipc_memh_t *memh;
    CUdevice cu_device;

+    UCT_CHECK_PARAM((params == NULL) ||
+                    UCT_MD_MEM_REG_FIELD_VALUE(params, memh, FIELD_MEMH, NULL) == NULL,
+                    "CUDA IPC does not support derived memory handles");


Should we make this a common macro and use it in other memory domains that don't support derived memh?

Yes, we can.
I think we can use this new macro only in those MDs which advertise INVALIDATION support but do not properly handle it, namely cuda_ipc_md and cma_md.
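A common check of this kind could look roughly like the sketch below. Every identifier is hypothetical: the params struct, the field-mask bit, the status codes and the macro names are illustrative stand-ins, not the UCX macros quoted in the diff above. The sketch assumes `params` is never NULL, which matches where the thread ended up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field-mask bit: set when the memh field is valid. */
enum { SKETCH_REG_FIELD_MEMH = 1u << 0 };

typedef enum {
    SKETCH_OK              = 0,
    SKETCH_ERR_UNSUPPORTED = -1
} sketch_status_t;

/* Hypothetical registration params with an optional base memh. */
typedef struct {
    uint64_t field_mask;
    void    *memh;
} sketch_reg_params_t;

/* Read the optional field, falling back to NULL when it is not set. */
#define SKETCH_REG_PARAM_MEMH(_params) \
    (((_params)->field_mask & SKETCH_REG_FIELD_MEMH) ? (_params)->memh : NULL)

/* Shared guard for memory domains without derived memh support. */
#define SKETCH_CHECK_NO_DERIVED_MEMH(_params) \
    do { \
        if (SKETCH_REG_PARAM_MEMH(_params) != NULL) { \
            return SKETCH_ERR_UNSUPPORTED; \
        } \
    } while (0)

/* Example registration entry point of an MD that rejects derived memh. */
static sketch_status_t sketch_md_mem_reg(const sketch_reg_params_t *params)
{
    SKETCH_CHECK_NO_DERIVED_MEMH(params);
    /* ... regular registration would continue here ... */
    return SKETCH_OK;
}
```

Because the field is read through the mask, a stale pointer in `memh` with the mask bit cleared is treated as "no base handle" and registration proceeds normally.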

yosefe · 2025-02-03T08:03:30Z

src/uct/ib/base/ib_md.c

-                               unsigned mem_flags, size_t memh_base_size,
-                               size_t mr_size, uct_ib_mem_t **memh_p)
+static uct_ib_mem_t *
+uct_ib_memh_alloc_internal(uct_ib_md_t *md, size_t memh_base_size,


Can we pick better names? Instead of uct_ib_memh_alloc_internal + uct_ib_memh_alloc, use uct_ib_memh_alloc + uct_ib_memh_new/init.

uct_ib_memh_alloc is an existing function, so we don't want to change its name
I just extracted some common part from uct_ib_memh_alloc into _internal so that it can be reused by uct_ib_memh_clone.

Maybe we name it uct_ib_memh_alloc_common?

yosefe · 2025-02-03T08:04:35Z

src/uct/ib/base/ib_md.c

+    memh = uct_ib_memh_alloc_internal(md, memh_base_size, mr_size, &memh_size);
+    if (memh == NULL) {
+        return UCS_ERR_NO_MEMORY;
+    }
+
+    memcpy(memh, src, memh_size);


clone seems wrong - if the derived memh is shallow, why need to fully copy the original memh?

seems weird that we use calloc() and then override everything with memcpy

Answering in the opposite order:
2. Right, malloc would be enough here, but I'm reusing the existing function uct_ib_memh_alloc_internal, which does calloc and calculates the size, just to reuse the common code. I think the overhead is minimal.
We could split the size calculation into a separate function etc., but I think it's not worth the effort.

1. Ok, maybe it's a terminology issue here.
My understanding is:
Deep copy creates an independent instance of an object, that can be used apart from the original.
Shallow copy creates an "alias" that depends on master copy and cannot be used separately.
The latter is what's implemented in this PR, whatever we name it.

Derived memh is a shallow copy, because it makes a shallow copy of MRs - the most important part of original memh. And original object remains the only owner of the MRs state.
Of course there could be different implementations of derived memh. For example, we can imagine a shallow copy looking like that:

    struct {
        uct_ib_mlx5_devx_mem_t *base;
        // derived specific fields
    } uct_derived_memh;

Looks ok, right? Still you need to allocate this object.
What are the issues with this approach:

There are places where we assume that memh always has the uct_ib_mem_t base: uct_rc_mlx5_txqp_tag_inline_post: ((uct_ib_mem_t*)iov->memh)
So we must add a uct_ib_mem_t super field to our copy.

The copy must handle a set of rkeys independently from the original memh, meaning that the following fields also need to be copied:

    struct mlx5dv_devx_obj *atomic_dvmr;
    struct mlx5dv_devx_obj *indirect_dvmr;
    uint32_t               atomic_rkey;
    uint32_t               indirect_rkey;

This is just to show you that we need to duplicate the significant part of the original memh anyway.

Then in each and every place we need to check whether the passed memh is derived or original (because their layouts are different), adding a lot of boilerplate code and CPU overhead.

Then we need to modify quite a few existing functions for key generation, because they cache keys in the original object.

To overcome all these issues I make a shallow copy of the original memh (== they have the same memory layout), and use it interchangeably in all the places. This allows me to minimize the amount of ifs in the code, basically we check for derived handle only during init and cleanup. This helps to avoid any refactoring in the existing functions.
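The same-layout approach argued for here can be sketched as follows. The types are simplified, hypothetical stand-ins (the real uct_ib_mem_t and uct_ib_mlx5_devx_mem_t carry many more fields), and the `SKETCH_MEM_DERIVED` flag is an assumption of the sketch. It shows the two points being made: existing code that casts to the common base keeps working on either handle without a branch, and only cleanup needs to distinguish derived from original.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag marking a handle as derived. */
enum { SKETCH_MEM_DERIVED = 1u << 0 };

/* Common "super" part that existing code casts any memh to. */
typedef struct {
    uint32_t lkey;
    uint32_t flags;
} sketch_ib_mem_t;

/* Full handle: super part, a per-handle key, and trailing MR state. */
typedef struct {
    sketch_ib_mem_t super;
    uint32_t        indirect_rkey;
    size_t          mr_count;
    uint32_t        mrs[]; /* stand-in for the MR state */
} sketch_devx_mem_t;

static size_t sketch_memh_size(size_t mr_count)
{
    return sizeof(sketch_devx_mem_t) + (mr_count * sizeof(uint32_t));
}

static sketch_devx_mem_t *sketch_memh_alloc(size_t mr_count)
{
    sketch_devx_mem_t *memh = calloc(1, sketch_memh_size(mr_count));

    if (memh != NULL) {
        memh->mr_count = mr_count;
    }
    return memh;
}

/* Same-layout copy: duplicate everything, then reset per-handle state. */
static sketch_devx_mem_t *sketch_memh_clone(const sketch_devx_mem_t *src)
{
    size_t size             = sketch_memh_size(src->mr_count);
    sketch_devx_mem_t *memh = malloc(size);

    if (memh == NULL) {
        return NULL;
    }

    memcpy(memh, src, size);
    memh->super.flags  |= SKETCH_MEM_DERIVED;
    memh->indirect_rkey = 0; /* the copy manages its own rkeys */
    return memh;
}

/* Existing-style accessor: no derived/original branch is needed. */
static uint32_t sketch_get_lkey(const void *memh)
{
    return ((const sketch_ib_mem_t*)memh)->lkey;
}

/* Returns how many MRs this handle would deregister on cleanup. */
static size_t sketch_memh_cleanup(sketch_devx_mem_t *memh)
{
    size_t released = (memh->super.flags & SKETCH_MEM_DERIVED) ?
                      0 : memh->mr_count;

    free(memh);
    return released;
}
```

The cost, as the reviewer points out in the next comment, is that the derived handle carries fields it does not own or use.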

I think we need to brainstorm the approaches. I don't like having structs where only some fields are used; maybe we need a deeper refactoring of the IB memh structure.

yosefe · 2025-02-03T08:05:43Z

src/uct/api/v2/uct_v2.h

+    /**
+     * Memory domain supports derived memory handle registration.
+     */
+    UCT_MD_FLAG_DERIVED        = UCS_BIT(13)


UCT_MD_FLAG_REG_DERIVED

yosefe · 2025-02-03T08:05:56Z

src/uct/ib/mlx5/dv/ib_mlx5dv_md.c

+                            const uct_ib_mlx5_devx_mem_t *src,
+                            uct_ib_mlx5_devx_mem_t **memh_p)
+{
+    size_t mr_size = src->super.flags & UCT_IB_MEM_IMPORTED ?


test/gtest/uct/cuda/test_cuda_ipc_md.cc

iyastreb added 3 commits November 26, 2024 10:18

UCT: Memory window

525c463

UCT: Added few assertions

15efe3e

UCT: Unit tests

4bb43d6

brminich reviewed Nov 28, 2024

tvegas1 reviewed Dec 9, 2024

iyastreb added 8 commits December 10, 2024 14:56

UCT: Addressed PR comments

ae21b04

UCT: cosmetic change

072de42

UCT: Merge branch 'master' into uct/memory-window

8885211

UCT: Fixed NPE

53a4b08

UCT: Fixed test

e121f20

UCT: Fixed test with atomic capability

c4fc840

UCT: Merge with UCX master branch

5a97574

UCT: Merge branch 'master' into uct/memory-window

0196e96

iyastreb mentioned this pull request Jan 9, 2025

UCP: Derived memh #10411

Open

rakhmets reviewed Jan 30, 2025

src/uct/api/v2/uct_v2.h

src/uct/ib/base/ib_md.c

test/gtest/uct/ib/test_ib_md.cc

UCT: Addressed PR comments

4204a9c

rakhmets previously approved these changes Jan 31, 2025

UCT: Addressed PR comments

7ccaab7

iyastreb dismissed rakhmets’s stale review via 7ccaab7 January 31, 2025 14:44

yosefe reviewed Feb 3, 2025

iyastreb added 2 commits February 4, 2025 16:52

UCT: Addressed PR comments

e0faea0

UCT/GTEST: Fixed NPE in test

b193dea

iyastreb changed the title ~~UCT: Memory window~~ UCT: Derived memh Feb 5, 2025

iyastreb added 3 commits February 5, 2025 07:43

UCT: Merge with latest master branch

78a80ab

UCT: Added params check to avoid derived memh handling to uct_md

9340de1

UCT: derived memh support exported mkey_pack

6ebe17e

rakhmets reviewed Feb 5, 2025

test/gtest/uct/cuda/test_cuda_ipc_md.cc

iyastreb added 3 commits February 5, 2025 13:25

UCT: use UCT API in cuda_ipc_md tests

58e9362

UCT/GTEST: Fixed export test

beb5107

UCT: replaced clone with copying

632f86e

iyastreb added 2 commits February 11, 2025 14:39

UCT: Minor change

334480b

UCT: Fix for coverity

077752c
